privacy vulnerability
Beyond Text: Unveiling Privacy Vulnerabilities in Multi-modal Retrieval-Augmented Generation
Zhang, Jiankun, Zeng, Shenglai, Ren, Jie, Zheng, Tianqi, Liu, Hui, Tang, Xianfeng, Liu, Hui, Chang, Yi
Multimodal Retrieval-Augmented Generation (MRAG) systems enhance LMMs by integrating external multimodal databases, but introduce unexplored privacy vulnerabilities. While text-based RAG privacy risks have been studied, multimodal data presents unique challenges. We provide the first systematic analysis of MRAG privacy vulnerabilities across vision-language and speech-language modalities. Using a novel compositional structured prompt attack in a black-box setting, we demonstrate how attackers can extract private information by manipulating queries. Our experiments reveal that LMMs can both directly generate outputs resembling retrieved content and produce descriptions that indirectly expose sensitive information, highlighting the urgent need for robust privacy-preserving MRAG techniques.
On the Privacy-Preserving Properties of Spiking Neural Networks with Unique Surrogate Gradients and Quantization Levels
Moshruba, Ayana, Snyder, Shay, Poursiami, Hamed, Parsa, Maryam
As machine learning models increasingly process sensitive data, understanding their vulnerability to privacy attacks is vital. Membership inference attacks (MIAs) exploit model responses to infer whether specific data points were used during training, posing a significant privacy risk. Prior research suggests that spiking neural networks (SNNs), which rely on event-driven computation and discrete spike-based encoding, exhibit greater resilience to MIAs than artificial neural networks (ANNs). This resilience stems from their non-differentiable activations and inherent stochasticity, which obscure the correlation between model responses and individual training samples. To enhance privacy in SNNs, we explore two techniques: quantization and surrogate gradients. Quantization, which reduces precision to limit information leakage, has improved privacy in ANNs. Given SNNs' sparse and irregular activations, quantization may further disrupt the activation patterns exploited by MIAs. We assess the vulnerability of SNNs and ANNs under weight and activation quantization across multiple datasets, using the attack model's receiver operating characteristic (ROC) curve area under the curve (AUC) metric, where lower values indicate stronger privacy, and evaluate the privacy-accuracy trade-off. Our findings show that quantization enhances privacy in both architectures with minimal performance loss, though full-precision SNNs remain more resilient than quantized ANNs. Additionally, we examine the impact of surrogate gradients on privacy in SNNs. Among five evaluated gradients, spike rate escape provides the best privacy-accuracy trade-off, while arctangent increases vulnerability to MIAs. These results reinforce SNNs' inherent privacy advantages and demonstrate that quantization and surrogate gradient selection significantly influence privacy-accuracy trade-offs in SNNs.
Technical Report for the Forgotten-by-Design Project: Targeted Obfuscation for Machine Learning
Brännvall, Rickard, Adomaitis, Laurynas, Görnerup, Olof, Sedrati, Anass
The right to privacy, enshrined in various human rights declarations, faces new challenges in the age of artificial intelligence (AI). This paper explores the concept of the Right to be Forgotten (RTBF) within AI systems, contrasting it with traditional data erasure methods. We introduce Forgotten by Design, a proactive approach to privacy preservation that integrates instance-specific obfuscation techniques during the AI model training process. Unlike machine unlearning, which modifies models post-training, our method prevents sensitive data from being embedded in the first place. Using the LIRA membership inference attack, we identify vulnerable data points and propose defenses that combine additive gradient noise and weighting schemes. Our experiments on the CIFAR-10 dataset demonstrate that our techniques reduce privacy risks by at least an order of magnitude while maintaining model accuracy (at 95% significance). Additionally, we present visualization methods for the privacy-utility trade-off, providing a clear framework for balancing privacy risk and model accuracy. This work contributes to the development of privacy-preserving AI systems that align with human cognitive processes of motivated forgetting, offering a robust framework for safeguarding sensitive information and ensuring compliance with privacy regulations.
Race and Privacy in Broadcast Police Communications
Venkit, Pranav Narayanan, Graziul, Christopher, Goodman, Miranda Ardith, Kenny, Samantha Nicole, Wilson, Shomir
Radios are essential for the operations of modern police departments, and they function as both a collaborative communication technology and a sociotechnical system. However, little prior research has examined their usage or their connections to individual privacy and the role of race in policing, two growing topics of concern in the US. As a case study, we examine the Chicago Police Department's (CPD's) use of broadcast police communications (BPC) to coordinate the activity of law enforcement officers (LEOs) in the city. From a recently assembled archive of 80,775 hours of BPC associated with CPD operations, we analyze text transcripts of radio transmissions broadcast 9:00 AM to 5:00 PM on August 10th, 2018 in one majority Black, one majority white, and one majority Hispanic area of the city (24 hours of audio) to explore three research questions: (1) Do BPC reflect reported racial disparities in policing? (2) How and when is gender, race/ethnicity, and age mentioned in BPC? (3) To what extent do BPC include sensitive information, and who is put at most risk by this practice? (4) To what extent can large language models (LLMs) heighten this risk? We explore the vocabulary and speech acts used by police in BPC, comparing mentions of personal characteristics to local demographics, the personal information shared over BPC, and the privacy concerns that it poses. Analysis indicates (a) policing professionals in the city of Chicago exhibit disproportionate attention to Black members of the public regardless of context, (b) sociodemographic characteristics like gender, race/ethnicity, and age are primarily mentioned in BPC about event information, and (c) disproportionate attention introduces disproportionate privacy risks for Black members of the public.